Stochastic gain in population dynamics 
We introduce an extension of the usual replicator dynamics to adaptive learning rates. We show that a population
with a dynamic learning rate can gain an increased average payoff in transient phases and can also exploit
external noise, leading the system away from the Nash equilibrium, in a resonance-like fashion. The payoff
versus noise curve resembles the signal to noise ratio curve in stochastic resonance. Seen in this broad context, 
we introduce another mechanism that exploits fluctuations in order to improve properties of the system. Such a 
mechanism could be of particular interest in economic systems. 
Game theory [1] describes situations in which the success
or payoff of an individual depends on its own action as well
as on the actions of others. This paradigm can be applied
to biological systems, as evolution through natural selection
can be viewed as an optimization process in which the fitness
landscape changes with the state of the adaptive populations
[2]. Evolutionary game theory focuses mainly on systems
with a single fitness function for all individuals, which is
identified with the payoff function of a game [3, 4, 5]. In nature,
different populations with different ambitions often interact
with each other, such as shoppers and sellers [6], attackers and
defenders [6], or males and females [5]. Here, the payoff
functions are different for the interacting populations. A
mean-field description of such asymmetric conflicts is given by the
coupled replicator equations [?]. These equations have a
very rich dynamical behavior and can even display Hamiltonian
chaos. In previous work [?, 7, ?] it has been tacitly
assumed that both populations have the same adaptation
mechanisms. But it seems to be natural that different mechanisms
are applied by the interacting populations, e.g. different
adaptation rates. Here, we analyze such systems for the case 
that both populations have slightly different adaptation mech- 
anisms. We assume that one population can control its own 
adaptation rate. This alters the velocity when the system is 
approaching the stable Nash equilibria [10] in strategy space,
leading to an increased average payoff. 

In real systems fluctuations disturbing the system are to be 
expected. Such disturbances can arise from a variety of effects,
e.g. errors of the players [11], deviations from a perfectly
mixed population, or immigration of individuals with
different strategy distributions. So far, stochastic extensions 
to the replicator dynamics have mainly been analyzed in the 
context of equilibrium selection [12, 13]. Here, we show that a
population with adaptive learning rate can obtain an increased 
payoff if these fluctuations are present. For small noise intensities
the average payoff increases, while very large fluctuations
can no longer be exploited, leading to a decrease of the
average payoff. This recalls the stochastic resonance effect,
where the signal-to-noise ratio of a system
is improved for intermediate noise intensities. In contrast to
the usual stochastic resonance, a periodic force is not involved
here, making the mechanism more similar to coherence resonance
[18]. Seen in this broader context, we introduce another
mechanism that exploits fluctuations in order to improve the 
performance of the system. 

We consider two adaptive species X and Y, each with different
strategies, that are involved in a repeated game. Both
populations have different objectives described by payoff
matrices $P_x$ and $P_y$. The fraction of individuals $x_i$ that adopt a
certain strategy $i$ grows in proportion to the relative payoff of
strategy $i$; the same holds for Y. In the presence of noise
this coevolution can be described by the coupled replicator
equations,

\[
\begin{aligned}
\dot{x}_i &= x_i\,\eta_x \left[\Pi_i^x - \langle\Pi^x\rangle\right] + \xi_i^x, \\
\dot{y}_i &= y_i\,\eta_y \left[\Pi_i^y - \langle\Pi^y\rangle\right] + \xi_i^y,
\end{aligned}
\tag{1}
\]

where $\eta_x$ and $\eta_y$ are the learning rates of the populations. We
assume for simplicity that the noise $\xi$ is Gaussian with autocorrelation
$\langle \xi_i^k(t)\,\xi_j^l(s)\rangle = \sigma^2 \delta_{ij}\delta_{kl}\delta(t-s)$ as in [12]. We
also follow [12] in choosing reflecting boundaries. The payoffs
are defined as $\Pi_i^x = (P_x \cdot y)_i$, $\langle\Pi^x\rangle = x^T \cdot P_x \cdot y$, and
similarly for $y$.

We extend the usual replicator dynamics by introducing
adaptive learning rates as

\[
\eta_x = 1 - \tanh\left(\alpha_x \Delta\Pi\right), \tag{2}
\]

where $\Delta\Pi = \langle\Pi^x\rangle - \langle\Pi^y\rangle$ is the time dependent difference
between the average payoffs of the populations and $\alpha_x \geq 0$ is
a "perception ability" of the population. In order to maintain
the basic features of the replicator dynamics, the learning rate
must be a positive function with $\langle\eta\rangle = 1$, which is ensured
by Eq. (2). For $\alpha_x > 0$ the population X learns slower if
it is currently in a good position, otherwise it learns faster.
The value of $\alpha_x$ determines how well a population can assess
its current state. The adaptive learning rate leads to a faster
escape from unfavourable states, while on the other hand the
population tends to remain in preferable states. Other choices
for $\eta_x$ which ensure the properties mentioned above will not
alter our results. In the following we will focus on a setting
where only one population has an adaptive learning rate $\eta_x$ as
in Eq. (2).

The noise introduced above drives the system away from 
the Nash equilibrium and leads for small amplitude to a posi- 
tive gain of the population with adaptive learning rate whereas 
for large noise amplitudes the fluctuations smear out the tra- 
jectories in phase space so strongly that they can no longer be 
exploited. Hence, we expect an optimal noise effect for intermediate
values of $\sigma$. In order to be able to compare the payoffs
of both populations we assume that the dynamics starts from 
the Nash equilibrium. 

As a first example, we consider the zero sum game "matching
pennies" [3, 19]. Here, both players can choose between
two options ±1. Player one wins if both players select the
same option and player two wins otherwise. The game is
described by the payoff matrices

\[
P_x = \begin{pmatrix} +1 & -1 \\ -1 & +1 \end{pmatrix} = -P_y. \tag{3}
\]

The replicator equations follow from Eqs. (1) and (3) as

\[
\begin{aligned}
\dot{x} &= -2\eta_x\, x\,(2y-1)(x-1) + \xi_x, \\
\dot{y} &= +2\eta_y\, y\,(2x-1)(y-1) + \xi_y,
\end{aligned}
\tag{4}
\]

where $x = x_0$ and $y = y_0$. Let us first consider the zero noise
limit in the case $\eta_x = \eta_y = 1$. As for all zero-sum games, i.e.
$P_x = -P_y$, the system (4) without noise becomes Hamiltonian
and has a constant of motion [20]. Here, the constant is
given by $H(x,y) = -2\ln[x(1-x)] - 2\ln[y(1-y)]$. The
trajectories oscillate around the Nash equilibrium at $x = y = 1/2$.
$H(x,y)$ is connected to the temporal integral of the average
payoff $\langle\Pi^x\rangle = x^T \cdot P_x \cdot y$ during a period with
$\langle\Pi^x\rangle > 0$,

\[
\int_{t_0}^{t_1} \langle\Pi^x\rangle\, dt = \frac{H(x,y) - 8\ln 2}{4}, \tag{5}
\]

where $(x,y) = (x_0, \tfrac{1}{2})$ at $t_0$ and $(x,y) = (\tfrac{1}{2}, x_0)$ at $t_1$.

If we include adaptive learning rates (2) into the system,
we find $\dot{H}(x,y) = -2\tanh(\alpha_x \Delta\Pi)\,\Delta\Pi \leq 0$, vanishing for
$\alpha_x = 0$. Hence, adaptive learning rates dampen the
oscillations around the Nash equilibrium and the trajectories in
the $x$-$y$ plane spiral towards the Nash equilibrium where
$\langle\Pi^x\rangle = \langle\Pi^y\rangle = 0$, see Fig. 1. In addition, this leads to an
increased payoff of one population. As the matrices (3) describe
a zero sum game it is sufficient for a population if it knows its
own current average payoff, since $\Delta\Pi = 2\langle\Pi^x\rangle$.
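
The statement that $H$ is conserved for constant learning rates and decays for $\alpha_x > 0$ can be checked numerically. The following sketch integrates the noise-free version of Eq. (4) with a classical fourth-order Runge-Kutta scheme. The integrator, step size, initial condition and function names are illustrative choices of ours and are not taken from the original work.

```python
import numpy as np

def drift(state, alpha_x):
    """Noise-free right-hand side of Eq. (4) with eta_y = 1."""
    x, y = state
    avg_pi_x = (2 * x - 1) * (2 * y - 1)               # <Pi^x> for matching pennies
    eta_x = 1.0 - np.tanh(alpha_x * 2.0 * avg_pi_x)    # Eq. (2) with DeltaPi = 2 <Pi^x>
    dx = -2.0 * eta_x * x * (2 * y - 1) * (x - 1)
    dy = 2.0 * y * (2 * x - 1) * (y - 1)
    return np.array([dx, dy])

def constant_of_motion(x, y):
    """H(x, y) = -2 ln[x(1-x)] - 2 ln[y(1-y)]."""
    return -2.0 * np.log(x * (1 - x)) - 2.0 * np.log(y * (1 - y))

def integrate(state, alpha_x, dt=0.01, steps=2000):
    """Classical RK4 integration over a total time dt * steps."""
    state = np.asarray(state, dtype=float)
    for _ in range(steps):
        k1 = drift(state, alpha_x)
        k2 = drift(state + 0.5 * dt * k1, alpha_x)
        k3 = drift(state + 0.5 * dt * k2, alpha_x)
        k4 = drift(state + dt * k3, alpha_x)
        state = state + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return state

start = (0.8, 0.6)
h_start = constant_of_motion(*start)
h_const = constant_of_motion(*integrate(start, alpha_x=0.0))   # stays at h_start
h_adapt = constant_of_motion(*integrate(start, alpha_x=10.0))  # decays towards 8 ln 2
print(h_start, h_const, h_adapt)
```

For `alpha_x = 0` the value of $H$ is unchanged up to the integration error, while for `alpha_x = 10` it decreases towards its minimum $8 \ln 2$ at the Nash equilibrium.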

Numerical simulations for $\alpha_x > 0$ show that the temporal
integral of the payoff becomes

\[
\left( \int_{t_0}^{t_1} \langle\Pi^x\rangle\, dt \right)_{(x_0,y_0)} = -\frac{1}{8}\left( H(x_1,y_1) - H(x_0,y_0) \right). \tag{6}
\]

The averaged initial value of $H(x_0,y_0)$ can be calculated as
$\int_0^1\!\int_0^1 dx_0\, dy_0\, H(x_0,y_0) = 8$. For $t \to \infty$ the system
relaxes to the Nash equilibrium where $H = 8\ln 2$. Hence, we
find for the average cumulated payoff
$\left\langle \int_0^\infty \langle\Pi^x\rangle\, dt \right\rangle_{(x_0,y_0)} = -\frac{1}{8}(8\ln 2 - 8) \approx 0.307$.
Numerical simulations yield $0.308 \pm 0.005$ independent of $\alpha_x$.
We conclude that a population can increase its average payoff if it
has an adaptive learning rate with $\alpha_x > 0$ and if the game does
not start in the Nash equilibrium. The adaptation parameter $\alpha_x$
influences only the time scale on which the Nash equilibrium is approached.

FIG. 1: Matching pennies: Comparison between the behavior of a
population with constant learning rate, i.e. $\alpha_x = 0$ (thin lines), and a
population with adaptive learning rate (perception ability $\alpha_x = 10$,
thick lines). The opponent has in both cases a constant learning rate
$\eta_y = 1$. Left: Trajectories in strategy space. Arrows show the vector
field of the replicator dynamics. Population X has positive (negative)
average payoff in gray (white) areas. Right: Time development of
the average payoff of the population X. The adaptive learning rate
increases the time intervals in which the corresponding population
has a positive payoff, dampening the oscillations around the Nash
equilibrium [22].
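
The two numbers entering the estimate of the average cumulated payoff are easy to verify. The sketch below approximates the average of $H$ over the unit square by a midpoint rule and evaluates $-\frac{1}{8}(8\ln 2 - 8) = 1 - \ln 2$, where the prefactor $1/8$ is the one consistent with the quoted value 0.307. The quadrature scheme, grid size and names are our own illustrative choices.

```python
import numpy as np

def average_initial_H(n=200_000):
    """Average of H(x0, y0) over the unit square by a midpoint rule.

    H is the sum of a function of x and the same function of y, so the
    double integral equals twice the one-dimensional integral.
    """
    u = (np.arange(n) + 0.5) / n                  # midpoints avoid the log singularities
    one_dim = np.mean(-2.0 * np.log(u * (1.0 - u)))
    return 2.0 * one_dim

H_nash = 8.0 * np.log(2.0)                        # H at x = y = 1/2
H_avg = average_initial_H()
gain = -(H_nash - H_avg) / 8.0                    # average cumulated payoff
print(H_avg, gain)
```

The printed values are close to 8 and 0.307, in line with the analytical estimate.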

Small noise intensities drive the system away from the 
fixed point and the population with the adaptive learning rate 
gains an increased payoff. If the noise amplitude $\sigma$ becomes
too large the trajectories will be smeared out homogeneously
over the positive (gray) and negative (white) payoff regions
in phase space (Fig. 1). This implies that the average gain
of population one decreases to zero. Although the average 
payoff is very small even for the optimal noise intensity, the 
cumulated payoff increases linearly in time. This means that 
for long times the gained payoff accumulates to a profitable 
value. 
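
The noisy dynamics of Eq. (4) can be explored with a simple Euler-Maruyama scheme. The sketch below is an illustration under stated assumptions rather than the procedure used for Fig. 2: the step size, run length, seed handling, the implementation of the reflecting boundaries (folding the state back into [0, 1]) and all function names are our choices.

```python
import math
import random

def reflect(z):
    """Fold z back into [0, 1], i.e. reflecting boundaries at 0 and 1."""
    z = z % 2.0
    return 2.0 - z if z > 1.0 else z

def average_payoff(sigma, alpha_x=10.0, dt=0.01, steps=20_000, seed=0):
    """Time-averaged payoff <Pi^x> of population X, starting at the Nash equilibrium."""
    rng = random.Random(seed)
    x = y = 0.5
    amp = sigma * math.sqrt(dt)                   # noise increment scale per step
    total = 0.0
    for _ in range(steps):
        avg_pi_x = (2 * x - 1) * (2 * y - 1)
        total += avg_pi_x
        eta_x = 1.0 - math.tanh(alpha_x * 2.0 * avg_pi_x)   # Eq. (2), zero sum game
        dx = -2.0 * eta_x * x * (2 * y - 1) * (x - 1)
        dy = 2.0 * y * (2 * x - 1) * (y - 1)                # eta_y = 1
        x = reflect(x + dx * dt + amp * rng.gauss(0.0, 1.0))
        y = reflect(y + dy * dt + amp * rng.gauss(0.0, 1.0))
    return total / steps

for sigma in (0.01, 0.1, 1.0):
    print(sigma, average_payoff(sigma))
```

A single run as above is only indicative. Reproducing a payoff versus noise curve would require scanning $\sigma$ over several decades and averaging over many seeds and longer runs.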

As a second application we analyze the effect of adaptive 
learning rates and noise on the prisoner's dilemma. We use 
the standard payoff matrix [21],

\[
P_x = \begin{pmatrix} 3 & 0 \\ 5 & 1 \end{pmatrix} = P_y, \tag{7}
\]

where rows and columns are placed in the order "cooperate",
"defect". As this game is not a zero sum game, the population
with the adaptive learning rate must be able to compare its
own average payoff with the opponent's average payoff. The
replicator dynamics of this system is determined by Eqs. (1)
and (7),

\[
\begin{aligned}
\dot{x} &= x\,\eta_x\,(x-1)(1+y) + \xi_x, \\
\dot{y} &= y\,\eta_y\,(y-1)(1+x) + \xi_y.
\end{aligned}
\tag{8}
\]



FIG. 2: Matching pennies: Average payoff of a population with adaptive
learning rate against a population with constant learning rate under
the influence of noise for different noise intensities ($\alpha_y = 0$,
averages over 2 x 10* initial conditions and 2 x 10* time steps, see
[23] for further details).



There is a stable fixed point in the Nash equilibrium $x = y = 0$,
where both players defect, and an unstable fixed point for
mutual cooperation, i.e., $x = y = 1$.
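
The drift terms of the prisoner's dilemma dynamics and the payoff difference $\Delta\Pi = -5(x - y)$ used later for the simplified model can be checked directly from the payoff matrix. The sketch assumes the commonly used prisoner's dilemma values (reward 3, sucker's payoff 0, temptation 5, punishment 1); these are consistent with the drift $x(x-1)(1+y)$ but are an assumption here, and the helper names are hypothetical.

```python
import numpy as np

# Assumed standard prisoner's dilemma payoffs; rows/columns: cooperate, defect.
P = np.array([[3.0, 0.0],
              [5.0, 1.0]])

def average_payoffs(x, y):
    """Average payoffs of X and Y when fractions x and y cooperate."""
    sx = np.array([x, 1.0 - x])
    sy = np.array([y, 1.0 - y])
    return sx @ P @ sy, sy @ P @ sx

def drift_x(x, y, eta_x=1.0):
    """Deterministic drift of the cooperator fraction x from the replicator equation."""
    sx = np.array([x, 1.0 - x])
    sy = np.array([y, 1.0 - y])
    pi = P @ sy                                   # payoffs of cooperate and defect
    return eta_x * x * (pi[0] - sx @ pi)

rng = np.random.default_rng(1)
for x, y in rng.random((5, 2)):
    avg_x, avg_y = average_payoffs(x, y)
    assert np.isclose(avg_x - avg_y, -5.0 * (x - y))             # DeltaPi = -5 (x - y)
    assert np.isclose(drift_x(x, y), x * (x - 1.0) * (1.0 + y))  # drift of Eq. (8)
```

The drift vanishes at $x = 0$ and $x = 1$ and is negative in between, which is why mutual defection is the stable fixed point.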

The average payoff difference under the influence of noise
behaves similarly to that in matching pennies. Small fluctuations lead the
system slowly away from the Nash equilibrium and tend to
increase the payoff. If the fluctuations are too large they disturb
the population with adaptive learning rate and the payoff
decreases again, see Fig. 3. Interestingly enough, here too much
noise even leads to a decreasing payoff difference.

FIG. 3: Prisoner's dilemma: Average payoff difference of a population
with adaptive learning rate against a population with constant
learning rate for different noise intensities. The negative payoffs arise
from the fact that we have $\eta_x < \eta_y$ for $x < y$ ($\Delta t = 0.01$, $\alpha_y = 0$,
averages over 2 x 10* initial conditions and 2 x 10* time steps).

In order to describe the "stochastic gain" effect analyti- 
cally we introduce a simplified model. A linearization of 
Eq. (8) around the stable Nash equilibrium leads for constant
learning rates to $\dot{x} = -\eta_x x + \xi_x$ and $\dot{y} = -\eta_y y + \xi_y$.
We now analyze a game in which the replicator dynamics is
given by these linear equations and include adaptive learning
rates based on the payoffs for the prisoner's dilemma. With
$\Delta\Pi = -5(x-y)$ the adaptive learning rate $\eta_x$ becomes
$\eta_x = 1 + \tanh(5\alpha(x-y)) \approx 1 + 5\alpha(x-y)$ for $\alpha, x, y \ll 1$.
The simplified system can be viewed as a small noise expansion
of the prisoner's dilemma, where the trajectory stays
close to the Nash equilibrium. For $\eta_y = 1$ the simplified noisy
replicator equations read

\begin{align}
\dot{x} &= -x - \alpha' x\,(x-y) + \xi_x, \tag{9a} \\
\dot{y} &= -y + \xi_y, \tag{9b}
\end{align}

where $\alpha' = 5\alpha$. The effect of different constant learning rates
is discussed in [23]. The mechanism we introduce here is
more intricate, as the adaptive learning rate leads to a dynamical
adjustment of the learning rate and the average of
$\eta_x = 1 + \alpha'(x-y)$ over all possible strategies is $\eta_y = 1$.

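Equation (9b) describes an Ornstein-Uhlenbeck process
[24]; here the dynamics is restricted to $0 < y < 1$. The
Fokker-Planck equation [25] for $p_y = p_y(y,t\,|\,y_0,t_0)$,

\[
\dot{p}_y = \frac{\partial}{\partial y}\left[ y\,p_y + \frac{\sigma^2}{2}\frac{\partial}{\partial y}\,p_y \right], \tag{10}
\]

has the stationary solution $p_y^s = N_y e^{-y^2/\sigma^2}$, where
$N_y^{-1} = \int_0^1 e^{-y^2/\sigma^2}\, dy$. We find the mean value $\langle y\rangle(\sigma)$ as

\[
\langle y\rangle = \int_0^1 dy\; p_y^s\, y = \frac{\sigma\left(1 - e^{-1/\sigma^2}\right)}{\sqrt{\pi}\,\mathrm{Erf}(1/\sigma)}. \tag{11}
\]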
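
The mean value of the restricted Ornstein-Uhlenbeck process can be cross-checked by integrating the stationary density $\propto e^{-y^2/\sigma^2}$ on $0 < y < 1$ numerically and comparing with the closed form $\sigma (1 - e^{-1/\sigma^2}) / (\sqrt{\pi}\,\mathrm{Erf}(1/\sigma))$. This is a minimal sketch with a midpoint rule; the grid size and function names are our own choices.

```python
import math
import numpy as np

def mean_y_closed_form(sigma):
    """<y> for the density ~ exp(-y^2 / sigma^2) restricted to 0 < y < 1."""
    return sigma * (1.0 - math.exp(-1.0 / sigma**2)) / (
        math.sqrt(math.pi) * math.erf(1.0 / sigma))

def mean_y_quadrature(sigma, n=200_000):
    """Same mean value from a midpoint rule on [0, 1]."""
    y = (np.arange(n) + 0.5) / n                  # midpoints of the grid cells
    weight = np.exp(-(y / sigma) ** 2)            # unnormalized stationary density
    return float(np.sum(y * weight) / np.sum(weight))

for sigma in (0.05, 0.5, 5.0):
    print(sigma, mean_y_closed_form(sigma), mean_y_quadrature(sigma))
```

For small $\sigma$ the mean approaches $\sigma/\sqrt{\pi}$, and for large $\sigma$ it tends to $1/2$, the mean of a uniform distribution on the unit interval.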



$y$ is a correlated stochastic process which appears in Eq. (9a)
as a multiplicative noise. Numerical simulations indicate that
we may neglect the stochastic nature of $y$ and replace it by
$\langle y\rangle$ for small $\alpha$. This leads to an approximated Fokker-Planck
equation for $p_x = p_x(x,t\,|\,x_0,0)$,

\[
\dot{p}_x = \frac{\partial}{\partial x}\left[ -a(x)\,p_x + \frac{\sigma^2}{2}\frac{\partial}{\partial x}\,p_x \right], \tag{12}
\]



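where $a(x) = -x - \alpha' x\,(x - \langle y\rangle)$. Since $x$ is (similarly to $y$)
also restricted to $0 < x < 1$ we find the stationary solution

\[
p_x^s = N_x \exp\left[ -\frac{x^2}{\sigma^2}\left( 1 - \alpha'\langle y\rangle + \frac{2\alpha' x}{3} \right) \right], \tag{13}
\]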



with the normalization constant $N_x$. Since $x$ is typically of
the order of $\sigma$ for $\sigma \ll 1$ the term $x^3/\sigma^2$ is finite. Therefore
we can expand Eq. (13) for $\alpha' \ll 1$ and obtain, expanding $\langle x\rangle$
again, an analytical expression for $\langle\Delta\Pi\rangle = -5(\langle x\rangle - \langle y\rangle)$,

\[
\langle\Delta\Pi\rangle = -5\alpha' \left.\frac{\partial\langle x\rangle}{\partial\alpha'}\right|_{\alpha'=0} = \dots, \tag{14}
\]

where $\delta$ and $\gamma$ are auxiliary functions of $\sigma$ involving $\mathrm{Erf}(1/\sigma)$.
The asymptotics of Eq. (14) can be computed as
$\langle\Delta\Pi\rangle = \alpha'/(24\sigma^2)$ for $\sigma \gg 1$
and $\langle\Delta\Pi\rangle = \alpha'\left(\frac{5}{2} - \frac{35}{6\pi}\right)\sigma^2$ for $\sigma \ll 1$.
We stress that this simplified system, which consists of a stable fixed
point with linear adaptive learning rate in the presence of noise, is the
simplest possible model that describes the stochastic gain effect.
Fig. 4 shows a comparison between the analytical payoff
difference Eq. (14) and a simulation of Eqs. (9a) and (9b).

FIG. 4: Simplified model: Comparison of the average payoff difference
$\langle\Delta\Pi\rangle$ from a simulation of Eqs. (9a) and (9b) and the analytical
function Eq. (14) ($\Delta t = 0.01$, $\alpha' = 5\alpha = 0.1$, averages over 4 x 10*
time steps and 4 x 10* realizations).
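
To see where the two asymptotic regimes come from, one can expand the stationary density to first order in $\alpha'$, assuming it has the form $p_x^s \propto \exp[-(x^2/\sigma^2)(1 - \alpha'\langle y\rangle + 2\alpha' x/3)]$ that follows from the drift $-x - \alpha' x (x - \langle y\rangle)$ and noise intensity $\sigma^2$. Writing $m_n$ for the $n$-th moment of the $\alpha' = 0$ density on $0 < x < 1$ (so that $m_1 = \langle y\rangle$), a short calculation gives $\langle\Delta\Pi\rangle \approx (5\alpha'/\sigma^2)\,[\tfrac{2}{3} m_4 + m_1^2 m_2 - \tfrac{5}{3} m_1 m_3]$. This moment form is our own rewriting and not the closed form of Eq. (14). The Python sketch below (hypothetical function names, midpoint quadrature) only checks it against the limits $\alpha'/(24\sigma^2)$ for large noise and $\alpha'(5/2 - 35/(6\pi))\sigma^2$ for small noise, the latter being the coefficient this expansion yields.

```python
import math
import numpy as np

def moments(sigma, n=400_000):
    """Moments m_0..m_4 of the density ~ exp(-x^2 / sigma^2) on 0 < x < 1."""
    x = (np.arange(n) + 0.5) / n                  # midpoint grid on [0, 1]
    w = np.exp(-(x / sigma) ** 2)
    w /= w.sum()                                  # normalized weights
    return [float(np.sum(w * x**k)) for k in range(5)]

def payoff_difference(sigma, alpha_prime=0.1):
    """First-order estimate of <DeltaPi> = -5 (<x> - <y>) in alpha'."""
    _, m1, m2, m3, m4 = moments(sigma)
    bracket = 2.0 * m4 / 3.0 + m1**2 * m2 - 5.0 * m1 * m3 / 3.0
    return 5.0 * alpha_prime * bracket / sigma**2

def large_noise_limit(sigma, alpha_prime=0.1):
    return alpha_prime / (24.0 * sigma**2)

def small_noise_limit(sigma, alpha_prime=0.1):
    return alpha_prime * (2.5 - 35.0 / (6.0 * math.pi)) * sigma**2

for sigma in (0.01, 0.3, 50.0):
    print(sigma, payoff_difference(sigma))
```

The estimate grows like $\sigma^2$ for weak noise and falls off like $1/\sigma^2$ for strong noise, so it passes through a maximum at intermediate noise intensity, which is the resonance-like behavior discussed in the text.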

To summarize, we have introduced an extension to the usual 
replicator dynamics that modifies the learning rates using
a simple "win-stay, lose-shift" rule. In this way, a population
optimizes the payoff difference to a competing population.
This simple rule leads to a convergence towards the
mixed Nash equilibrium for the game of "matching pennies"
[?]. Even in games with stable Nash equilibria, such as the
"prisoner's dilemma", transient phases can be exploited, although
the basins of attraction are not altered, as e.g. in [23]. Weak
external noise drives the system into the transient regime and 
leads to an increased gain for one adaptive population. 

In conclusion, we have found a learning process which 
improves the gain of the population with adaptive learning 
rate under the influence of external noise. Fluctuations lead 
to an increased payoff for intermediate noise intensities in a 
resonance-like fashion. This phenomenon could be of particular
interest in economics, where interactions are always
subject to external disturbances.